A Parallel Expressed Sequence Tag (EST) Clustering Program

Authors

Kevin T. Pedretti

Todd E. Scheetz

Terry A. Braun

Chad A. Roberts

Natalie L. Robinson

Thomas L. Casavant

Abstract

This paper describes the UIcluster software tool, which partitions Expressed Sequence Tag (EST) sequences and other genetic sequences into “clusters” based on sequence similarity. Ideally, each cluster contains only sequences that represent the same gene. A naïve approach, such as an N×N comparison (where N is the number of input sequences), is feasible only for very small data sets. UIcluster has been developed over the course of four years to solve this problem efficiently and accurately for large data sets consisting of tens or hundreds of thousands of EST sequences. The latest version of the application has been parallelized using the MPI (Message Passing Interface) standard. Both the computation and the memory requirements of the program can be distributed among multiple (possibly distributed) UNIX processes.
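To make the cost of the naïve N×N approach concrete, here is a minimal Python sketch of all-pairs clustering. It is not UIcluster's algorithm: the shared k-mer similarity, the threshold, and the names `shared_kmer_fraction`, `naive_cluster`, and `pair_count` are illustrative assumptions. The point is only that every pair of sequences is compared, so the work grows quadratically with N.

```python
from itertools import combinations


def shared_kmer_fraction(a: str, b: str, k: int = 8) -> float:
    """Toy similarity (an assumption, not UIcluster's measure):
    shared k-mers divided by the size of the smaller k-mer set."""
    kmers_a = {a[i:i + k] for i in range(len(a) - k + 1)}
    kmers_b = {b[i:i + k] for i in range(len(b) - k + 1)}
    if not kmers_a or not kmers_b:
        return 0.0
    return len(kmers_a & kmers_b) / min(len(kmers_a), len(kmers_b))


def naive_cluster(seqs, threshold=0.5, k=8):
    """Compare every pair of sequences and merge similar ones (union-find).
    Returns the clusters as lists of indices, plus the comparison count."""
    parent = list(range(len(seqs)))

    def find(i):
        # Follow parent links to the cluster representative, halving paths.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    comparisons = 0
    for i, j in combinations(range(len(seqs)), 2):
        comparisons += 1
        if shared_kmer_fraction(seqs[i], seqs[j], k) >= threshold:
            parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(seqs)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values()), comparisons


def pair_count(n: int) -> int:
    """Number of distinct pairs among n sequences: n(n-1)/2."""
    return n * (n - 1) // 2


if __name__ == "__main__":
    # Hypothetical toy sequences: the first two overlap, the third does not.
    toy = ["ACGTACGTTTGACCA", "CGTACGTTTGACCAG", "TTTTTTTTGGGGGGGG"]
    print(naive_cluster(toy))
    print(pair_count(100_000))
```

For 100,000 input sequences, `pair_count` gives roughly five billion pairwise comparisons, which is why the all-pairs approach only suits very small data sets.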


Related resources

Massively parallel expressed sequence tag clustering

Expressed Sequence Tag (EST) sequencing is a highly efficient technique that samples expressed genes required for most cellular functions. While this is a well-studied problem and many software tools have been developed, large-scale EST clustering has previously been pursued through incremental approaches, a pipeline of programs and manual efforts to achieve a modest degree of parallelism. Here...


SEAN: SNP prediction and display program utilizing EST sequence clusters

SEAN is an application that predicts single nucleotide polymorphisms (SNPs) using multiple sequence alignments produced from expressed sequence tag (EST) clusters. The algorithm uses rules of sequence identity and SNP abundance to determine the quality of the prediction. A Java viewer is provided to display the EST alignments and predicted SNPs.


Evaluating the Significance of Global and Local Features in Expressed Sequence Tag: A Clustering Quality Perspective

Clustering of expressed sequence tag (EST) plays an important role in gene analysis. Alignment-based sequence comparison is commonly used to measure the similarity between sequences, and recently some of the alignment-free comparisons have been introduced. In this paper, we evaluate the role of global and local features extracted from the alignment free approaches i.e., compression-based method...


Efficient clustering of large EST data sets on parallel computers.

Clustering expressed sequence tags (ESTs) is a powerful strategy for gene identification, gene expression studies and identifying important genetic variations such as single nucleotide polymorphisms. To enable fast clustering of large-scale EST data, we developed PaCE (for Parallel Clustering of ESTs), a software program for EST clustering on parallel computers. In this paper, we report on the ...


Genetic Diversity and Population Structure of Iranian tulips revealed by EST-SSR and NBS-LRR Markers

The genus Tulipa L. (Liliaceae) comprises about 100 species and Iran is considered as one of the main origins of tulips. In this research, genetic diversity and population structure of 27 wild populations of tulips collected from Iran were studied by 15 highly polymorphic and reproducible expressed sequenced tag-simple sequence repeat (EST-SSR) markers and 8 nucleotide binding site (NBS)-enzyme...



Publication date: 2001
